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Abstract 

The use of bonus-malus systems in compulsory liability automobile insurance is a worldwide 
applied method for premium pricing. If certain assumptions hold, like the conditional Poisson distri- 
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f***^ , bution of the policyholders claim number, then an interesting task is to evaluate the so called claims 

frequency of the individuals. Here we introduce 3 techniques, two is based on the bonus-malus class, 
and the third based on claims history. The article is devoted to choose the method, which fits to 
the frequency parameters the best for certain input parameters. For measuring the goodness-of-fit we 

p\ ■ will use scores, similar to better known divergence measures. The detailed method is also suitable to 

jrt , compare bonus-malus systems in the sense that how much information they contain about drivers. 

Keywords: bonus-malus, claims frequency, Bayesian, scores 
1 INTRODUCTION 

Motivation 

In compulsory liability automobile insurance a worldwide applied method is to calculate the premium of 
the drivers based on a so-called bonus-malus system. This might be vary for different countries, but the 
idea is similar: policy-holders with bad history should pay more than others without accidents in the past 
years. Schematically and mathematically described, there is a graph with certain vertices (they called 
classes), among others an initial vertex, where every new driver begins. After a year without causing any 
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1. Introduction 



accident he or she jumps up to another class, which has a cheaper premium. Otherwise, in case of causing 
an accident, the insured person goes downward, to a new class with higher premium, except he or she 
was already in the worst one. 

Also many other factors are taken into consideration when calculating ones premium, like the engine 
type, purpose of use, habitat, age of the person etc. Here we do not deal with these components, only 
concentrate on the bonus-malus system. (There are many famous books about bonus-malus systems 
where we can read about these factors, e.g. [3].) Namely, our aim is to estimate the expected A number 
of accidents for certain drivers. This is usually called the claims frequency of the policyholder. First we 
review our necessary assumptions, among others about the distribution of claim numbers of a policyholder 
in one year, and the Markovian property of a random walk in such a system. Section 2 will discuss the 
basic problem illustrated with Belgian, Brazilian and Hungarian example. Since also a Bayesian approach 
will be used for estimation of A, a general a priori assumption is also needed. Based on this and a gappy 
information about the driver, the a posteriori expected value of A will be calculated. 

Giving the possible closest estimation for A to the actual one is a crucial task in the insurers life, since 
the expected value of claims, and which is the most important, the claims payments are forecasted using 
A. Let us note that for evaluating the size of payments on the part of the insurer, the size of the certain 
property damages has to be approximated, not even the number of them. Here we will not care for it in 
this article. 

Bonus-malus systems 

We suppose the following assumptions, however, it is surely a simplification of the reality: 
Assumptions 1.1 • A is a constant value in time 1 

• the random walk on the graph of classes is a homogeneous Markov chain, i.e. the next step depends 
only on the last state, and independent of time 

• the distribution of a policyholders claim number for a year is Poisson(X) distributed 

Notation 1.2 For a bonus-malus system consisting of n classes, let C\ denote the worst one with the 
highest premium, C*2 the second worst etc., and finally C n the best premium class which can be achieved 
by a driver. On the other hand, let B t be the class of the policy-holder after t steps (years). 

Using these notations, the Markovian property can be written as P(B t = Ci\B t -\, . . . , B\) = P{B t = 
Ci\B t -i). As the B t process is supposed to be homogeneous, it is correct to simply write pij instead of 
P(B n+ \ = j\B n = i). These values specify annxn stochastic matrix with non-negative elements, namely 
the transition probability matrix of the random walk on states. Let us denote it by M(A). Now to be more 
specific, we outline the example of the three different systems, the Belgian, Brazilian and Hungarian. 

Example 1.3 (Hungarian system) In the Hungarian bonus-malus system there are 15 premium classes, 
namely an initial (Aq), 4 malus (M^, . . . , M\) and 10 bonus (Bx,...,B\q) classes". In our above men- 
tioned terminology, we can think of it as C\ = M4, . . . , C4 = M\,C^ — Aq,C§ = Si, . . . , C15 = -Bio- 
After every claim-free year the policyholder jumps one step up, unless he or she was in Bio, when there 
is no better class to go. The consequence of every reported damage is 2 classes relapse, and at least 4 
damages pulls the driver back to the worst M± state. Thus the transition probability matrix takes the form 
of Equation 7.1, see Appendix. 



The case of the variable A in time can be handled using double stochastic processes (Cox processes for instance), i.e. 
A(t) would also be a stochastic process. 

2 We have to be careful here not to confuse the class index of _B, with the time index! 
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Example 1.4 (Brazilian system) 7 premium classes: Aq, Bi,B2, ■ ■ ■ ,B§. Sometimes written as 7, 6, 
5, . . ., 1 classes, e.g. in Jean Lemaire's article [4]- Transition rules can be found in the cited article. 

Example 1.5 (Belgian system) Here we look at the new Belgian system introduced in 1992, and within 
this we focus on business-users. The only difference between them and pleasure-users is the initial class. 
Transition rules can also be found in article [4]- There are 23 premium classes: M%, Mj, . . ., Mi, Aq, 
B\, B2, . . . , -B14 (sometimes written as 23, 22, 21, . . . , 2, 1 classes). 

2 THE BAYESIAN APPROACH 

Our realistic problem is the following. When an insured person changes insurance institution, the new 
company not necessarily gets his or her claim history, only the class where his or her life has to keep 
going. The new insurer also knows the number of years the policyholder has spent in the liability insurance 
system. Nevertheless, based on this two information we would like to provide the best possible estimation 
for the certain persons A. First of all we need some new notations. 

Notation 2.1 B t — c denotes the event that the investigated policyholder has spent t years in the system 
(more precisely, from the initial class he or she has taken t steps), and arrived in class c. 

Notation 2.2 ttq is the initial discrete distribution on the graph, which is a row vector of the form 

(0, . . . , 0, 1 , 0, . . . , 0). (Every driver begins in the initial C{ state.) 
it h 

As a Bayesian approach, suppose that claims frequency is also a random variable, and denote it by A. 
In practice, the gamma mixing distribution seems to be a right choice for that, therefore only this case 
will be discussed now. For other cases, the calculations can be similarly, though not similarly nicely done. 

Notation 2.3 T(a,f$) is the gamma distribution with a shape and f3 scale parameters. (Defining the 
scale parameter precisely, the density function is f(x) = £ \ e .) 

For some fixed A = A the driver causes X ^Poisson(A) property damages, i.e. P{X = fc|A = A) = 
^e - ^. It is well known that the unconditional distribution will be negative binomial with parameters 
( a i j+g ) m our notations. In view of this the a priori parameters can be estimated by a standard 
momentum or maximum likelihood method. 

Remark 2.4 It is certainly necessary to make hypothesis testing after all, because our assumptions about 
the mixing distribution might be wrong. In this negative-binomial-rejecting case we have two important 
alternatives. They are to be found among others in [1], for instance. On the one hand we can try inverse 
Gaussian, and on the other hand lognormal distribution for mixing. 

Estimation of distribution parameters 

In this subsection we give an estimation method to compute the approximate values of a and f3 parameters. 

Let us suppose that the insurance company has claim statistics from the past few years containing n 

policyholders. The ith insured person caused Xi accidents by his or her fault over a time period of 

ti years, where ti is not necessarily an integer. According to our assumption, the distribution of Xi is 

Poisson(ti • O), where ti is a personal time factor and 8 is a Gammafa, /^-distributed random variable. 

The unconditional distribution of Xi as mentioned above is Negative Binomial I a, -f+g), thus its first 

two moments are 

cv. 
EXi=ti- (2.1) 



2 



3. Other methods for frequency estimation 



Now we construct a method of moments estimation, which is not obvious how to build. For example, on 

the one hand, we can say that V EXi = f E *i> thus f = i=l — _ On the other hand, V ^^- = n%, 

i=i p i=\ p £ U i= i * p 

thus ^ = *=i- — . Generally they do not give the same results, except ti = £2 = • • ■ = tn- Based on our 
simulations, we used the most accurate method, where the approximation of parameters are the result of 



the following equations. 
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Finally we have to notice the maximum likelihood method as another obvious solution for this parameter 
estimation. Unfortunately, the likelihood function generally has no maximum, hence this method has 
been rejected. 

Conditional probabilities 

Now we have the a priori a, /3 parameters, and the B t = c information with the initial class = Bq. 
According to Bayes theorem, the conditional density of A is 

P(B t - c|A = A) • / A (A) 



/A|B t =c(A|c) = 



/ P(B t = C |A = A) • / A (A)dA 

o 



and the estimation for A is the a posteriori expected value 



A = / A • / A |B t=c (A|c)dA. 
o 

Unfortunately, P(B t — c|A = A) presents some difficulty, because it only can be calculated pointwise as 
a function of A, powering the M(X) transition matrix. If / denotes the index of the initial class in the 
graph, this probability is exactly the M(X) t , I , c ,s element of the matrix, where \c\ is the index of class c. 
Using numerical integration it can be solved relatively fast. If we have a glance at figure 2.1, we might see 
the claim frequency estimations for different countries and different a and /? parameters. For example, 
the first figure 2.1a was made based on the Brazilian bonus-malus system with parameters a = 1.2 and 
fi = 19, which indicates a relatively high risk portfolio. There is a line in the chart for each bonus class, 
which values show the estimated A frequencies in function of time. We also see that these functions 
stationarize in more decades, thus the information about in the system elapsed time of the individual is 
important. 

We will refer to this method as method. 1 later. 

3 OTHER METHODS FOR FREQUENCY ESTIMATION 

In this section we shortly introduce 2 other (known) possible methods to evaluate policyholders' claim 
frequencies. Our assumptions for the distribution of claim numbers certainly still hold. 
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Figure 2.1: Estimated A parameters for different countries and a,/3 parameters on a time horizon of 30 
years. (Note that some points in the colored version are not shown in these figures. For every class, charts 
start at points, from where the probability of being there is positive.) 
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Average claim numbers of classes 

The following method is probably the simpliest, and we will refer to this as method. 2. Let us suppose that 
statistics from the last year claim numbers are available. Take the average caused claims for every bonus 
classes, i.e., if last year our portfolio contained 5 policyholders each in class C with claims 0,1,0, 0, 2, then 
the estimation for insured people's frequencies present year in class C is Ac = 0.6. Although it seems to 
be too simplified, in some cases it gives the best results in certain circumstances. 

Claim history of individuals 

At last the third method is classical, and will be referred as method. 3. Here we use the insured persons 
claim history, i.e., suppose that he or she was insured by our company for t years, and the distribution of 
claim numbers for each year is Poisson(A). Let X denote the number of aggregate claims caused by this 
person in his t-year-long presence in the system, more precisely, in our field of vision. Then the conditional 
distribution of X is also Poisson with parameter At. Let us remark that t does not have to be an integer 
(people often change insurer in the middle of the year in most countries). Based on this, our estimation 
on a certain A is the conditional expected value of the Gamma distributed A on condition X = x, i.e., as 
well known, A = E(A\X = x) = f±f . 

This is also a Bayesian approach as in the case of method. 1, but the condition is different. Besides 
notice that first the a and (3 parameters have to be estimated exactly the way we have seen it before in 
Section 2. 

4 COMPARISON USING SCORES 

The aim of this section is to make a decision which method gives the possible most accurate estimation 
for claims frequencies. In this context, we will use the theory of so called scores. For much more detailed 
information see article [2], as here we will discuss only the most important properties useful for our 
problem. 

Scores are made for measuring the accuracy of probabilistic forecasts, i.e., measuring the goodness- 
of-fit of our evaluations. Let Q, be a sample space, A a c-algebra of subsets of fl and V is a family of 
probability measures on (CI, A). Let a scoring rule be a function S : V x il — > R = [—00,00]. We work 
with the expected score S(P, Q) — J S(P,uj)dQ(uj), where measure P is our estimation and Q is the real 
one. Obviously one of the most important properties of this function is the inequality S(Q, Q) > S(P, Q) 
for all P,Q^V. In this case S is proper relative to V . On the other hand, we call S regular relative to 
class P, if it is real valued, except possibly it can be —00 in case of P 7^ Q. If these two properties hold, 
then the associated divergence function is d(P, Q) = S(Q, Q) — S(P, Q). 

Here we mention two main scoring rules, which will be used in sections below for comparing our 
evaluation methods. Remember that for an individual, the conditional distribution of the number of 
accidents is Poisson, so in our notations let pi (i = 0, 1, 2, . . .) be P(X = i|A = A), i.e., the probabilities 
of certain claim numbers using the estimated A as condition. Similarly, q{ (i = 0, 1,2, . . .) is the same 
probability, but for the real A frequency. 

Brier Score 

Sometimes called quadratic score, as the associated Bregman divergence is d(p,q) = XXp« ~ ' qi) 2 ■ The 

i 

score is defined as 

(4.1) 



S(P,Q) = 2(^ Piq A-(j2 P 2 )-l. 
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Here we use deliberately the unknown qi probabilities. Although, in practice they are not know, in our 
simulations we are able to make decisions using them. If we want to calculate the score of our estimation 
subsequently, we change qiS to a Dirac-delta depending on the number of claims caused. In other words, 
if the examined policyholder had i claims last year, then the corresponding score is 

Logarithmic Score 

The logarithmic score is defined as 

S(P,Q) = Y,<lilogPi. (4.2) 

i 

(In case of i caused accidents S(P,i) — \ogpi.) We mention that the associated Bregman divergence is 
the Kullback-Leibler divergence d(p, q) = ^ qi log — . 

i 
5 SIMULATION AND RESULTS 

Using R program we simulated portfolios for testing our methods the following way. This can be used for 
frequency evaluation in practice, if we have the appropriate inputs. First we need a portfolio containing N 
insured people, which is used to estimate the a and (3 parameters of the negative binomial distribution. 
We think of it as the policyholders' histories available in the insurers database. Taking advantage of 
the whole claim and bonus-malus history, we have done it exactly the way as described in Section 2. 
In the next step, we generated the history of an M policyholder-containing portfolio, assuming that the 
distribution parameters are unchanged compare to the first portfolio. This might result some bias, but 
this is the best we can do based on our available data. 

After that, we estimated the claim frequency parameters of policyholders separately based on the 
three estimation methods described above, and compared them to the real A parameters using scores. 
Our aim is to make a decision among the methods, and decide, which would give the best fit results in 
certain cases. In other words, for certain input parameters, which method gives good-fit estimations in 
expected value, where the higher score value means the better goodness-of-fit. Before discussing results, 
we better stop for a few remarks. 

As we are interested in the expected values of scores for certain inputs, we will apply a Monte-Carlo- 
type technique. This means that we generate the above mentioned two portfolios n times independently, 
but in each first ones preserving the a and /3 distribution inputs. After that, based on the approximated 
a and /3, we estimate the A parameters of the second portfolios. Each method gives one score number 
for each simulation, which is the average of scores calculated for individuals. (Of course aggregate scores 
would be equally appropriate, since it differs only in an M multiplier from the average.) At last we take 
the mean of mean scores, and the higher score resulting method is the better. 

Remark 5.1 For reference we will write and plot the score results also for comparing the real frequencies 
to the real frequencies, since these scores are not equal to 0, as in the divergence case. In function of year 
steps, these scores should be constant, contrary to the charts below, where small differences can be observed. 
The reason is that in the Monte- Carlo simulations we generated also the A parameters over and over. 

Remark 5.2 In our simulations input parameters are 

1. N the number of policyholders in the first portfolio, which is used to estimate the a and (3 parameters. 

2. M the number of policyholders in the second portfolio. This contains individuals, whose A parameters 
have to be evaluated. In practice, there might be an overlap between these two files. 
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3. Real a and j3 distribution parameters. 

4- Number of year steps. This means the time elapsed in years, since the certain individual is insured 
by our company. Note that it implies the knowledge of claims history and bonus classification, thus 
it is a very important feature, as it affects the goodness-of-fit of our estimation methods. 

5. Number of years elapsed before entering our company. This affects method. 1 and method. 2, because 
the Markov chain on the bonus classes converges slowly to the stationary distribution. 

6. Transition rules of the examined country. 

7. Number of simulated portfolios. As we approximate the scores via Monte- Carlo-type technique, it 
has to be large enough. 

Figure 5.1 shows an example. We simulated 50 times a 80 thousand-people-containing portfolio, esti- 
mated a and f3 parameters, then estimated the A parameters of 20 thousand policyholders. After that we 
set the results of the three methods against the real frequencies using scores. For every simulation and 
country we got 2 scores, the Brier score and Log score. The points of the charts are the average scores for 
this 50 simulations for certain year steps. Certain year steps mean that we generated the so called second 
portfolios (the current we are analysing) as we had information about policyholders 1, 2, 5, 10, 15 and 20 
years back, respectively. Intermediate points are approximated linearly. Note that standard deviation of 
the sample of Brier scores in the Hungarian example are under 0.0009, and under 0.0016 in case of Log 
scores. 

Remark 5.3 In practice, there are different lengths of claim histories available for different policyholders. 
In function of this length, we can decide that the parameters of a group of insured people will be evaluated 
using method. 2, and the rest using method. 3, for example. 

Remark 5.4 Here (5.1) we let the random walks of policyholders on the systems run for 15 years, thus 
our observations start at that point, i.e., year steps start at that time. 

The example clearly shows the differentiation capability of systems containing more bonus-malus 
classes. In other words, the more classes the system has, the more years needed for method. 3 to get the 
start of method. 2. Here the Bayesian type method. 1 is the worst in every case, but we shall not forget 
that in certain circumstances it can be useful. For example, if the claim history is largely deficient. 

On the tested parameters the two types of scores gave almost the same results (difference is not 
significant), what we have been expecting. Same means that if method. x is the best according to Brier 
scores, then it is the best according to Log scores, too. 

6 CONCLUSIONS 

In this article we introduced the principle of bonus-malus systems, and the necessary assumptions about 
the distribution of claim numbers of policyholders, inter alia that the X number of claims caused in a 
year by an insured person is conditionally Poisson distributed. The goal was to evaluate these A frequency 
parameters, which is the expected number if claims caused, and we did not deal with the size of them. 
Since the unconditional distribution is negative binomial, we can simply evaluate the shape and scale 
parameters based on the insurers claim history from past years. Though the current portfolio might have 
some different parameters, this file is the best we can use. 

We introduced 3 methods for frequency estimation, where one was never used by actuaries to our 
knowledge, and the other two are known. The main aim of this article was to decide which method is 
the most appropriate in certain circumstances, i.e., for given parameters. Our decision is made based 
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Figure 5.1: Mean Brier and Log scores in the Brazilian, Hungarian and Belgian system, when TV = 80000, 
M = 20000, real distribution parameters a = 1.2 and j3 ~ 14, and 50 portfolio simulation. 
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on scores, which are devoted to measure the bias of two distributions. If method. x gives Ai,...,Am 
frequency parameters, and the real ones are Ai, . . . , Am, then method. x is the best choice among the 
other methods, if the average score is greater than in the other cases. The discussion implies a Monte- 
Carlo-type algorithm, which can be used in practice to make decisions. 

At last, but not least, our method includes a technique, which is suitable to compare bonus-malus 
systems in the following sense. In function of years, the longer the method. 2 is better than others, the 
more informative is the system, as we expect more accurate evaluation of claims frequencies using the 
past years average claim numbers in different classes, than using other methods. For parameters chosen 
in example 5.1, in the Brazilian system, method. 3 based on claims history becomes the most appropriate 
in the second year, while in the Hungarian system it needs 7-8, and in the Belgian 16-17 years. 
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7.1 shows the transition probability matrix of the Hungarian bonus-malus system. 



M(A) = 



l-e~ A 

l-e~ A 

l-e~ A 

l-e~ A 

l-(A + l)e- A 

l-(A + l)e- A 

l-(A 2 /2! + A+l)e 

l-(A 2 /2! + A+l)e- A 

l-(A 3 /3! + A 2 /2! + A + l)e 

l-(A 3 /3! + A 2 /2! + A + l)e- 
y l-(A 3 /3! + A 2 /2! + A + l) 
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A 2 -e- A /2! 
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(7.1) 
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